skip to main content


Search for: All records

Creators/Authors contains: "Li, Yuanbo"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. An interleaved-Dyck (InterDyck) language consists of the interleaving of two or more Dyck languages, where each Dyck language represents a set of strings of balanced parentheses.InterDyck-reachability is a fundamental framework for program analyzers that simultaneously track multiple properly-matched pairs of actions such as call/return, lock/unlock, or write-data/read-data.Existing InterDyck-reachability algorithms are based on the well-known tabulation technique.

    This paper presents a new perspective on solving InterDyck-reachability. Our key observation is that for the single-source-single-target InterDyck-reachability variant, it is feasible to summarize all paths from the source node to the target node based onpath expressions. Therefore, InterDyck-reachability becomes an InterDyck-path-recognition problem over path expressions. Instead of computing summary edges as in traditional tabulation algorithms, this new perspective enables us to express InterDyck-reachability as aparenthesis-countingproblem, which can be naturally formulated via integer linear programming (ILP).

    We implemented our ILP-based algorithm and performed extensive evaluations based on two client analyses (a reachability analysis for concurrent programs and a taint analysis). In particular, we evaluated our algorithm against two types of algorithms: (1) the general all-pairs InterDyck-reachability algorithms based on linear conjunctive language (LCL) reachability and synchronized pushdown system (SPDS) reachability, and (2) two domain-specific algorithms for both client analyses. The experimental results are encouraging. Our algorithm achieves 1.42×, 28.24×, and 11.76× speedup for the concurrency-analysis benchmarks compared to all-pair LCL-reachability, SPDS-reachability, and domain-specific tools, respectively; 1.2×, 69.9×, and 0.98× speedup for the taint-analysis benchmarks. Moreover, the algorithm also provides precision improvements, particularly for taint analysis, where it achieves 4.55%, 11.1%, and 6.8% improvement, respectively.

     
    more » « less
  2. Many program-analysis problems can be formulated as graph-reachability problems. Interleaved Dyck language reachability ( InterDyck -reachability) is a fundamental framework to express a wide variety of program-analysis problems over edge-labeled graphs. The InterDyck language represents an intersection of multiple matched-parenthesis languages (i.e., Dyck languages). In practice, program analyses typically leverage one Dyck language to achieve context-sensitivity, and other Dyck languages to model data dependencies, such as field-sensitivity and pointer references/dereferences. In the ideal case, an InterDyck -reachability framework should model multiple Dyck languages simultaneously . Unfortunately, precise InterDyck -reachability is undecidable. Any practical solution must over-approximate the exact answer. In the literature, a lot of work has been proposed to over-approximate the InterDyck -reachability formulation. This article offers a new perspective on improving both the precision and the scalability of InterDyck -reachability: we aim at simplifying the underlying input graph G . Our key insight is based on the observation that if an edge is not contributing to any InterDyck -paths, we can safely eliminate it from G . Our technique is orthogonal to the InterDyck -reachability formulation and can serve as a pre-processing step with any over-approximating approach for InterDyck -reachability. We have applied our graph simplification algorithm to pre-processing the graphs from a recent InterDyck -reachability-based taint analysis for Android. Our evaluation of three popular InterDyck -reachability algorithms yields promising results. In particular, our graph-simplification method improves both the scalability and precision of all three InterDyck -reachability algorithms, sometimes dramatically. 
    more » « less
  3. Dyck-reachability is a fundamental formulation for program analysis, which has been widely used to capture properly-matched-parenthesis program properties such as function calls/returns and field writes/reads. Bidirected Dyck-reachability is a relaxation of Dyck-reachability on bidirected graphs where each edge u → ( i v labeled by an open parenthesis “( i ” is accompanied with an inverse edge v → ) i u labeled by the corresponding close parenthesis “) i ”, and vice versa. In practice, many client analyses such as alias analysis adopt the bidirected Dyck-reachability formulation. Bidirected Dyck-reachability admits an optimal reachability algorithm. Specifically, given a graph with n nodes and m edges, the optimal bidirected Dyck-reachability algorithm computes all-pairs reachability information in O ( m ) time. This paper focuses on the dynamic version of bidirected Dyck-reachability. In particular, we consider the problem of maintaining all-pairs Dyck-reachability information in bidirected graphs under a sequence of edge insertions and deletions. Dynamic bidirected Dyck-reachability can formulate many program analysis problems in the presence of code changes. Unfortunately, solving dynamic graph reachability problems is challenging. For example, even for maintaining transitive closure, the fastest deterministic dynamic algorithm requires O ( n 2 ) update time to achieve O (1) query time. All-pairs Dyck-reachability is a generalization of transitive closure. Despite extensive research on incremental computation, there is no algorithmic development on dynamic graph algorithms for program analysis with worst-case guarantees. Our work fills the gap and proposes the first dynamic algorithm for Dyck reachability on bidirected graphs. Our dynamic algorithms can handle each graph update ( i.e. , edge insertion and deletion) in O ( n ·α( n )) time and support any all-pairs reachability query in O (1) time, where α( n ) is the inverse Ackermann function. We have implemented and evaluated our dynamic algorithm on an alias analysis and a context-sensitive data-dependence analysis for Java. We compare our dynamic algorithms against a straightforward approach based on the O ( m )-time optimal bidirected Dyck-reachability algorithm and a recent incremental Datalog solver. Experimental results show that our algorithm achieves orders of magnitude speedup over both approaches. 
    more » « less
  4. null (Ed.)
    Many program analyses need to reason about pairs of matching actions, such as call/return, lock/unlock, or set-field/get-field. The family of Dyck languages { D k }, where D k has k kinds of parenthesis pairs, can be used to model matching actions as balanced parentheses. Consequently, many program-analysis problems can be formulated as Dyck-reachability problems on edge-labeled digraphs. Interleaved Dyck-reachability (InterDyck-reachability), denoted by D k ⊙ D k -reachability, is a natural extension of Dyck-reachability that allows one to formulate program-analysis problems that involve multiple kinds of matching-action pairs. Unfortunately, the general InterDyck-reachability problem is undecidable. In this paper, we study variants of InterDyck-reachability on bidirected graphs , where for each edge ⟨ p , q ⟩ labeled by an open parenthesis ”( a ”, there is an edge ⟨ q , p ⟩ labeled by the corresponding close parenthesis ”) a ”, and vice versa . Language-reachability on a bidirected graph has proven to be useful both (1) in its own right, as a way to formalize many program-analysis problems, such as pointer analysis, and (2) as a relaxation method that uses a fast algorithm to over-approximate language-reachability on a directed graph. However, unlike its directed counterpart, the complexity of bidirected InterDyck-reachability still remains open. We establish the first decidable variant (i.e., D 1 ⊙ D 1 -reachability) of bidirected InterDyck-reachability. In D 1 ⊙ D 1 -reachability, each of the two Dyck languages is restricted to have only a single kind of parenthesis pair. In particular, we show that the bidirected D 1 ⊙ D 1 problem is in PTIME. We also show that when one extends each Dyck language to involve k different kinds of parentheses (i.e., D k ⊙ D k -reachability with k ≥ 2), the problem is NP-hard (and therefore much harder). We have implemented the polynomial-time algorithm for bidirected D 1 ⊙ D 1 -reachability. D k ⊙ D k -reachability provides a new over-approximation method for bidirected D k ⊙ D k -reachability in the sense that D k ⊙ D k -reachability can first be relaxed to bidirected D 1 ⊙ D 1 -reachability, and then the resulting bidirected D 1 ⊙ D 1 -reachability problem is solved precisely. We compare this D 1 ⊙ D 1 -reachability-based approach against another known over-approximating D k ⊙ D k -reachability algorithm. Surprisingly, we found that the over-approximation approach based on bidirected D 1 ⊙ D 1 -reachability computes more precise solutions, even though the D 1 ⊙ D 1 formalism is inherently less expressive than the D k ⊙ D k formalism. 
    more » « less